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We introduce a strictly single-site DMRG algorithm based on the subspace expansion of the 
Alternating Minimal Energy (AMEn) method. The proposed new MPS basis enrichment method is 
sufficient to avoid local minima during the optimisation, similarly to the density matrix perturbation 
method, but computationally cheaper. Each application of H to IT) in the central eigensolver is 
reduced in cost for a speed-up of « {d+ l)/2, with d the physical site dimension. Further speed-ups 
result from cheaper auxiliary calculations and an often greatly improved convergence behaviour. 
Runtime to convergence improves by up to a factor of 2.5 on the Fermi-Hubbard model compared 
to the previous single-site method and by up to a factor of 3.9 compared to two-site DMRG. The 
method is compatible with real-space parallelisation and non-abelian symmetries. 


I. INTRODUCTION 

Since its introduction in 1993,^^^ the Density Matrix 
Renormalisation Group method (DMRG) has seen tre¬ 
mendous use in the study of one-dimensional systems .1^ 
Various improvements such as real-space parallelisation,^^ 
the use of abelian and non-abelian symmetries and 
multi-grid methodS have been proposed. Most 
markedly, the introductiorP of density matrix perturb¬ 
ation steps allowed the switch from two-site DMRG to 
single-site DMRG in 2005, which provided a major speed¬ 
up and improved convergence in particular for systems 
with long-range interactions. 

Nevertheless, despite some progress,l2Hii] (nearly) two- 
dimensional systems, such as long cylinders, are still a 
hard problem for DMRG. The main reason for this i s the 
different scaling of entanglement due to the area law F^ l is I 
In one dimension, entanglement and hence matrix di¬ 
mensions in DMRG are essentially size-independent for 
ground states of gapped systems, whereas in two dimen¬ 
sions, entanglement grows linearly and matrix dimen¬ 
sions roughly exponentially with system width. 

As a result, the part of the Hilbert space considered 
by DMRG during its ground state search increases dra¬ 
matically, resulting mainly in three problems: firstly, 
the DMRG algorithm becomes numerically more chal¬ 
lenging as the sizes of matrices involved grow (we will 
assume matrix-matrix multiplications to scale as O(m^) 
throughout the paper). Secondly, the increased search 
space size makes it more likely to get stuck in local min¬ 
ima. Thirdly, while sequential updates work well in 1-D 
chains with short-range interactions, nearest-neighbour 
sites in the 2-D lattice can be separated much farther in 
the DMRG chain. Therefore, improvements to the core 
DMRG algorithm are still highly worthwhile. 

In this paper, we will adopt parts of the AMEn 
methocff^ developed in the tensor train/numerical lin¬ 
ear algebra community to construct a strictly single-site 
DMRG algorithm that works without accessing the (full) 
reduced density matrix. Gompared to the existing center- 


matrix wavefunction formalism (GWF),li^ we achieve a 
speed-up of « (d -I- l)/2 during each application of H 
to |T) in the eigensolver during the central optimisation 
routine, where d is the dimension of the physical state 
space on each site. 

The layout of this paper is as follows: Section [H] will 
establish the notation. Section |III| will recapitulate the 
density matrix perturbation method and the GWF. Sec¬ 
tion |IV| will introduce the subspace expansion method 
and the heuristic expansion term with a simple two- 
spin example. The strictly single-site DMRG algorithm 
(DMRG3S) will be presented in Section alongside a 
comparison with the existing GWF. As both the original 
perturbation method and the heuristic sub space expan¬ 
sion require a mixing factor a,^ Section |Vl] describes how 
to adaptively choose a for fastest convergence. Numer¬ 
ical comparisons and examples will be given in Section 

Ivnl 

II. DMRG BASICS 

The notation established here closely follows the review 
article Ref. 3] Gonsider a state |T) of a system of I sites. 
Each site has a physical state dimension di, e.g. \/i : di = 
3, Z = 50 for a system of 50 S' = 1 spins: 

1^) = X C<Ti...<7jcri...crz) . (1) 

(Ji...ai 

In practice, the dimension of the physical basis is usually 
constant, Wi : di = d, but we will keep the subscript to 
refer to one specific basis on site i where necessary. 

It is then possible to decompose the coefficients 
Ccri,...,cri as a series of rank-3 tensors Mi,... ,M/ of size 
(di, rui-i, TOi) respectively, with mp = m; = 1. The coef¬ 
ficient can then be written as the matrix product 

of the corresponding matrices in Mi,..., M/: 

|T)= ^ . (2) 

(Tl...cri ' ^ 
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The maximal dimension m = max^ {rrii} is called the 
MPS bond dimension. In typical one-dimensional calcu¬ 
lations, m = 200, but for e.g. 32 x 5 cylinders, m > 5000 
is often necessary. It is in these numerically demanding 
cases that our improvements are of particular relevance. 

Similarly, a Hamiltonian operator can be written as a 
matrix product operator (MPO), where each tensor Wi is 
now of rank 4, namely {di,di,Wi-i,Wi): 

H= ^ Wr^---Wr^\a,...ai){r,...n\. (3) 

TI...TI 

w = maxi {wi} is called the MPO bond dimension. We 
will usually assume that for most i, mi = m and Wi = w. 
In practice, this holds nearly everywhere except at the 
ends of the chain, where the mi grow exponentially from 
1 to TO. The basis of Mi {Wi) of dimension TOi_i (wi_i) 
is called the left-hand side (LHS) basis, whereas the basis 
of dimension mi (wi) is the right-hand side (RHS) basis 
of this tensor. For simplicity, to^, di and Wi can also refer 
to the specific basis (and not only its dimension) when 
unambiguous. 

Instead of Mi, we will also write Ai (Bi) for a left 
(right) normalised MPS tensor: 

(4) 

(7i 

. (5) 

<Ti 

If we then define the contractions 

k = {Al^ ■■■At\^M^') &{di,...,d,,m,) (6) 

r, = (to,_i, ...,di) , (7) 

we can rewrite Itk) from ([^ as 

I*)" E ^^ri+ilCTi . . .(Ti) (g) |(Ti+i . . .CT;) . (8) 

<7l...<7i 

That is, when only considering one specific bond (z, z-l-l), 
the left and right MPS bases at this bond are built up 
from the states generated by the MPS tensor chains to 
the left and right of the bond. Individual elements of an 
MPS basis are therefore called “state”. 

Furthermore, define Lq = 1 and Li = Li_iA\WiAi 
with summation over all possible indices. Similarly, 
Ri+i = 1 and Ri = Ri+iBjWiBi. With these contrac¬ 
tions, it is possible to write 

(^-liJl^-) = L,_,MjW,M,R,+^ (9) 

for any i G [0, Z]. 

DMRG then works by sweeping over the system mul¬ 
tiple times. During each sweep, each site tensor Mi is 
sequentially updated once with each update consisting of 
one optimisation step via e.g. a sparse eigensolver and 
possibly one enrichment step during which the left or 


right MPS basis of Mi is changed in some way. Depend¬ 
ing on the exact implementation, updates may work on 
one (single-site DMRG) or two sites (two-site DMRG) 
at a time. The enrichment step may be missing or im¬ 
plemented via Density Matrix Perturbation or Subspace 
Expansion. 

III. PERTURBATION STEP AND 
CENTERMATRIX WAVEFUNCTION 
FORMALISM (CWF) 

A. Convergence Problems of Single-Site DMRG 

During single-site DMRG, only a single MPS tensor 
Mi on site i is optimised at once. Gompared to two- 
site DMRG, the search space is reduced by a factor of 
d « 2... 5, leading to a speed-up of at least 0{d) per 
iteration.!^ However, since the left and right bases of the 
tensors Mi are fixed and defined by the environment {U-i 
and Ti+i), this approach is likely to get stuck. While also 
occurring if there are no symmetries implemented on the 
level of the MPS, this issue is most easily visible if one 
considers U{1) symmetries.!^ assume that all basis states 
to the right of the RHS bond of Mi transform as some 
quantum number Sz- If we now target a specific sector, 
e.g. S '2 = 0 overall, then on the LHS of this bond (i.e. 
from the left edge up to and including Mi), all states 
must transform as — s^. In this configuration, it is im¬ 
possible for a local change of Mi to add a new state that 
transforms as, say, s)., to its right basis states, as there 
would be no corresponding state —si, to the right of that 
bond, rendering the addition of the state moot from the 
perspective of the local optimiser, as its norm will be zero 
identically. A concrete example of this issue is given in 
Section IVH AI 

DMRG is a variational approach on the state space 
available to MPS of a given bond dimension. As such, 
the algorithm must converge into either the global or a 
local minimum of the energy in this state space. Hence, 
we will call all cases where DMRG converges on an en¬ 
ergy substantially higher than the minimal energy achiev¬ 
able with the allowed MPS bond dimension cases where 
DMRG is stuck in local minima. 


B. Density Matrix Perturbation 

This convergence problem has been solved by White 
(2005).!^ In the following, we will assume a left-to-right 
sweep, sweeping in the other direction works similarly, 
but on the left rather than right bonds. After the local 
optimisation of the tensor Mi, the reduced density matrix 

p,^R = k_iM,Mjll_, ( 10 ) 

is built on the next bond. This is the reduced density 
matrix resulting from tracing out the part of the system 
to the left of bond {i,i + 1). 
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Pi^n is then perturbed as 

^ = Pi.-R + aTr . (11) 

The new p' ^ is then used to decide on a new set of 
basis states on the RHS of Mi, with the inverse mapping 
from the new to the old basis being multiplied into each 
component of -Bi+i. The mixing factor a is a small scalar 
used to control the perturbation. A new scheme to find 
the optimal choice of a is discussed in Section |vg 

C. Centermatrix Wavefunction Formalism (CWF) 


small cylinders. However, both the cost of the applica¬ 
tions of /f to I'l') as 0{w{(P + d)m^) as well as the large 
density matrix p S (dm, dm) cause problems if m and w 
become large. 

IV. SUBSPACE EXPANSION 

The idea of using subspace expan sion i nstead of dens¬ 
ity matrix perturbation originate j^^ l ^^ l in the tensor 
train/numerical linear algebra community. There, a 
stringent proof was given regarding the convergence 
properties of this method when the local tensor Zi of 
the residual 


In a standard single-site DMRG calculation, the re¬ 
duced density matrix pi^n is never used. More import¬ 
antly, even building p,/j on a given bond {i,i + 1) will 
not yield a density matrix that can be used in (111, as it 
only contains the states existing on that bond already 
without knowledge of the m^-i states on the bond one 
step to the left. In other words, it is not possible to 
choose the optimal set m) based only on rather, one 
requires also di and m^-i. 

The centermatrix wavefunction formalisiiJ^ was de¬ 
veloped to cope with this problem. Given a site tensor 
Mi G. (di, mi_i, mi) on a left-to-right sweep, it introduces 
a “centermatrix” Ci ,R e {dimi-i,mi) and replaces the 
original site tensor as 


M,,^ Ai e {d^, mi- 1 , d^mi^i) s.t. Mi = AiC^^R. (12) 

Ai is constructed to be left-orthogonal and is essentially 
an identity matrix mapping the left basis mi-i and the 
physical basis di onto a complete basis containing all 
dimi-i states on its right. The new basis is “complete” 
in the sense that all states reachable from the left bond 
basis mi_i and the local physical basis di are contained 
within it. 

The contents of Mi are placed in Ci^R accordingly and 
the original state remains unchanged. The reduced dens¬ 
ity matrix is then pi^R = Ci^rCI ^ and has access to all 
dimi-i states, as required above. A perturbation of pt^R 
according to hence allows the introduction of new 
states. 

The DMRG optimisation step can work on Ci^R alone, 
with Li built prior to optimisation of Ci^R from the 
expanded Ai. During each eigensolver step, the ef¬ 
fective Hamiltonian on site i has to be applied onto 
Ci^R. The application is done by contraction of Li S 
{w,dimi-i,dimi-i), Ri+i £ {w,mi,mi) and Ci^r £ 
{dimi-i,mi) at cost 0{w{d? + d)m^) per step. After op¬ 
timisation, the perturbation is added. Its computational 
cost is dominated by the calculation of OLLY{LiPi^RL\} at 
0{wd^m^). The bond between Ai and Ci^R, can then be 
truncated down to m using p) and the remaining parts 
of Ci^R are multiplied into Ri+i to the right. 

The resulting algorithm converges quickly for one¬ 
dimensional problems and performs reasonably well for 


\Z) = H\^) - E\^) = Z'[^---Zp\ai...ai) (13) 

is used as the expansion term. Here, we will only use the 
method of subspace expansion and substitute a numeric¬ 
ally much more cheaply available expansion term. 

The following section is divided into three parts: firstly, 
we will explain the concept of subspace expansion acting 
on two neighbouring MRS tensors Mi, Mi+i. Secondly, 
the expansion term employed in DMRG3S is introduced 
and motivated. Thirdly, a simple example is described. 


A. Subspace Expansion with an Arbitrary 
Expansion Term 


In the following, we will describe subspace expansion of 
the RHS basis of the current working tensor, as it would 
occur during a left-to-right sweep. 

Assume a state I'k) described by a set of tensors 
{Ai,...,Ai-i,Mi,Bi+i,...,Bi\. At the bond (z,i-|-l), 
we can then decompose the state as a sum over left and 
right basis states as in Eq. ([^. 

Now we expand the tensor Mi £ [d, mi-i, mi) by some 
expansion term Pi £ (d,mi-i,mp,) for each individual 
physical index component: 

^ Mp = [Mp Pp] . (14) 

This effectively expands the RHS MRS basis of Mi from 
mi to mi + mp.. Similarly, expand the components of 
Bipi £ (d, TOi, mi_|_i) with zeros: 


B 


CTi+i 

i+l 


^ B 


o-i+l 

i+l 


-^ 2+1 


(15) 


The appropriately-sized block of zeros only multiplies 
with the expansion term Pp. In terms of a decomposi¬ 
tion as in (|^, this is equivalent to 


Ivk) 




|cri...cri)® |cri+l,...,cr/) (16) 


where p is the result of multiplying li_i and Pi, with 
the 0 in the second expression similarly resulting from 
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the 0 in While the state I'h) remains unchanged, 

the local optimiser on the new site Bi+i can now choose 
the initially-zero components differently if so required: 
The necessary flexibility in the left-/right basis states to 
escape local minima has been achieved without referring 
to the density matrix. 

Note that while orthonormality of i3i+i is lost, we do 
not need it between the enrichment step on site i and 
the optimisation step on site i + 1. The orthonormality 
of Mi can be restored via singular value decomposition 
as usual. Furthermore, it is usually necessary to truncate 
the RHS basis of Mi down from rrii + mp. to m imme¬ 
diately following the expansion: this preserves the most 
relevant states of the expansion term while avoiding an 
exponential explosion of bond dimensions. 

When sweeping from right to left, the left rather than 
right MPS basis of the current working tensor is expan¬ 
ded, with the left tensor Ai-i being zero-padded as op¬ 
posed to the right tensor S^+i: 


Mf- 


Mf* = 


m: 

P7 


A"ri ^ A'^ri = [AZ~i" 0 ] 


(17) 

(18) 


new states and disturb the result of the eigensolver, which 
is optimal at this specific value of m but not an eigenstate 
of H yet. 


The cost of a single subspace expansion is 0(wdw? + 
w^(Pm?) for the calculation of Pi, potentially 0{2dwm?) 
for the addition to Mi and i?i+i respectively and 
0{dw^m^ + d^rn?) for the SVD of an {dm,wm) matrix 
formed from Mi. If we restrict the SVD to m singular 
values, then the resulting matrices will be of dimension 
{dm,m), (m,m) and (m,wm) respectively. The first can 
be reformed into Ai at cost 0{dm^) and the second and 
third multiplied into Si+i at cost 0{m^dw + m?d). The 
total cost of this step is dominated by the cost of the 
SVD at 0{dw^rn?), which is still cheaper than the calcu¬ 
lation of the perturbation term in (111, not considering 
the other costs associated to using the density matrix for 
truncation. 


C. Subspace Expansion at the Example of a 
d = Z = 2 Spin System 


B. Expansion Term 

Using the exact residual as the expansion term is com¬ 
putationally expensive: The term can be updated 

locally and is mostly unproblematic, but the subtraction 
of if 14') and subsequent re-orthonormalisation is costly 
and has to be done after each local optimisation, as the 
current value of E changes. This exact calculation is 
hence only possible for m « 100, which is far too small 
to tackle difficult two-dimensional problems. 

Instead, we propose the very cheaply available terms 

P^ = aLi^iMiWi e {di,mi-i,Wimi) (19) 


In the following, we will demonstrate and illustrate the 
method of subspace expansion at the simple example of 
a system of two spins with S = ^ from m = 1 to m = 2 
as it would occur during a left-to-right sweep. 

Assume the Hamiltonian 

H = Sl,S^,+SlSl + SlS^, ( 20 ) 

= l{SlS^+S\Sl}+SlS^, ( 21 ) 

with MPO-components 


to be used during left-to-right sweeps and Pi = 
aRi+iAIiWi for use during right-to-left sweeps with some 
scalar mixing factor a. In the regime where the exact 
residual can be computed, these terms work essentially 
equally well. 

This expression for Pi can be heuristically motivated 
as follows: (191 is equivalent to the partial projection of 
i7|4') onto 1^7 to the left of the current bond. Hence, in 
the ground state and ignoring numerical errors, the RHS 
basis of this Pi is identical to that of Mi. Truncation 
from rrii -|- mp^ to rPi is then possible without inducing 
errors. 

Numerically, it seems possible to choose a arbitrarily 
large without hindering convergence or perturbing the 
state too much in simple (one-dimensional) problems. 
However, if the chosen maximal bond dimension m is 
insufficient to faithfully capture the ground state of the 
given system, a has to be taken to zero eventually to 
allow convergence. Otherwise, Pi will continuously add 


Wi = 
W 2 = 


-^s+ 

Lv^ 


1 


1.72 




72 

72^^ 


5, 


Sr. 


( 22 ) 

(23) 


Let the initial state be an m = 1 MPS, described by 
components 


Al = [a] 

= [b] 


At = 

of _ 
^2 “ 


7i — 


(24) 

(25) 


where square brackets denote matrices in the MPS bond 
indices. Due to the standard normalisation constraints, 
there are only two free scalar variables here, a and b. 

Subspace expansion of Ai is straightforward (keep in 
















5 


mind that Lq = 1 for convenience): 


Pj = wI^AI + Wj^A\ 


Tt aI 


VT^ 

V2 

at _ u/A At 


0 a 


P^ = Wt'A[ + W^^Af 


0 


V2 


— \/l — 


(26) 

(27) 

(28) 

(29) 

(30) 


resulting in A!^ and B '2 directly after the expansion: 


The inner loop sweeps over the system, iterating over 
and updating the tensors on each site sequentially. Each 
local update during a left-to-right sweep (right-to-left 
sweeps work analogously) consists of the following steps: 

1. Optimise the tensor Mi. Use an eigensolver tar¬ 
geting the smallest eigenvalue to find a solution 
(M*, A*) to the eigenvalue problem 


U_lR^+lW^M, = AM, . (39) 

A* is the new current energy estimate. This first 
step dominates the computational cost. 


Af = 

a 

0 a\ 


II 

VI-a2 0 ^ -VI 

£>2 — 

1 1 

"O 0 0 0 

1_1 

_ 

£>2 — 

Vi — 

0 

0 

0 


(31) 

(32) 

(33) 


Normalising A{ via a singular value decomposition as 
A[ A'/S'Ul and multiplying SV^B^ B'^ gives: 


= [1 0 ] 
= [0 1 ] 


50 = 



yjl — 


ab 

2 — 

Vl — cBb 

Hi _ 

aVl — 


y/l—a 


Vl — a^\/l — iA 


0 

a 

V2 


a 

— Vl — a? 


(34) 

(35) 

(36) 

(37) 

(38) 


As expected, the final state |4>) = is 

still entirely unchanged, but there is now a one-to-one 
correspondence between the four entries of B 2 and the 
coefficients in the computational basis, mak¬ 

ing the optimisation towards ca = 0,Ci^j = 0 trivial. 


V. STRICTLY SINGLE-SITE DMRG 

We can now combine standard single-site DMRG (e.g. 
Ref. m p. 67) with the subspace expansion method 
as a way to enrich the local state space, leading to a 
strictly single-site DMRG implementation (DMRG3S) 
that works without referring to the density matrix at 
any point. 

With the notation from Section |llj the steps follow 
mostly standard single-site DMRG. In an outermost loop, 
the algorithm sweeps over the system from left-to-right 
and right-to-left until convergence is reached. Griteria 
for convergence are e.g. diminishing changes in energy or 
an overlap close to 1 between the states at the ends of 
subsequent sweeps. 


2. Build aPi according to (191 using M*. Build an 
appropriately-sized zero block O^+i after the dimen¬ 
sions of Pi are known. 


3. Subspace-expand M* —)■ M* with aPi and 
with Oi+i. 


4. Apply a SVD to M* and truncate its right basis to 
rrii again, resulting in A*. 

5. Multiply the remainder of the SVD (5Uf) into 
Bi+i —>■ i?i+i. 


6. Build Li from A*, Li_i and Wi. 

7. Galculate a new energy value after truncation based 

on Li, Wi+i and Ri+i- Use this energy value 

and A* to adapt the current value of a (cf. Sec¬ 
tion VI). 


8. Gontinue on site i + 1. 


Of these, step 2 and 3 implement the actual subspace 
expansion, whereas all others are identical to standard 
single-site DMRG. 

It is important to note that the only change from 
standard single-site DMRG is the addition of an en¬ 
richment step via subspace expansion. Therefore, this 
method doe s no t interfere with e.g. real-space p arall el- 
ised DMRG ,1^^ the use of nonabelian symmetrie d^ * ^^ ! or 
multi-grid methods.^ 

To analyse the computational cost, we have to take 
special care to ensure optimal ordering of the multiplic¬ 
ations during each eigensolver iteration in (391. The 


problem is to contract Li_ii?i_|_iIUiMi, with and 

Ri+i G (w,m,in), Wi S {d,d,w,w) and Mi G [d,m,m). 
The optimal ordering is then {{{Li-iMi)Wi)Ri+i)'. 

1. Gontract Li_i and Mi over the left MRS bond at 
cost 0{mw ■ m ■ dm = m^wd). 


2. Multiply in Wi over the physical bond of Mi 
and the left MPO bond at cost 0{n? ■ wd ■ dw = 
mAdAvcA). 


3. Finally contract with Ri+i over the right MPO and 
MPS bonds at cost 0{md ■ wm ■ m = m^dw). 
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Figure 1. (Colour online) Energies of the state at different 
points during a single update: Before optimisation, the state 
has some initial energy Ei. Local optimisation via the eigen- 
solver takes this energy down by AEo to Emin- Subsequent 
truncation causes a rise in energy by AEt with the Hnal value 
at the end of this update being Ef. 


The total cost of this procedure to apply H to Idi) 
is 0{2m^wd + (Pm'^w^). Assuming large (Pw/m is 
small, this gives a speed-up in the eigensolver multiplic¬ 
ations of (d-b l)/2 over the CWF approach, which takes 
0{m^wd{d + 1)). 


In addition to this speed-up, the subspace expansion 
is considerably cheaper than the density matrix perturb¬ 
ation. Since the perturbation/truncation step can often 
take up to 30% of total computational time, improve¬ 
ments there also have a high impact. At the same time, 
the number of sweeps at large m needed to converge does 
not seem to increase compared to the CWF approach (cf. 
Section VIII and sometimes even decreases. 


VI. ADAPTIVE CHOICE OF MIXING FACTOR 


Fig. displays the individual steps within a single up¬ 
date from the energy perspective: Let AEq denote the 
gain in energy during the optimisation step and let AEx 
denote the subsequent rise in energy during the trun¬ 
cation following the enrichment step. AEt p 0 only 
occurs if some enrichment (either via density matrix per¬ 
turbation or subspace expansion) has occurred, otherwise 
there would be no need for any sort of truncation. We 
can hence control the approximate value of AEt via a, 
which leads to a simple adaptive and computationally 
cheap algorithm: 

If AEt was very small or even negative (after chan¬ 
ging the optimised state by expansion of its right basis) 
during the current update, we can increase a during the 
next update step on the next site. If, on the other hand, 
\AEt\ « \AEo\, that is, if the error incurred during 
truncation nullified the gain in energy during the optim¬ 
isation step, we should reduce the value of a at the next 
iteration to avoid making this mistake again. 

In practice, it seems that keeping AEt ~ —O.SAAo 
gives the fastest convergence. Given the order-of- 
magnitude nature of a, it is furthermore best to in¬ 
crease/decrease it via multiplication with some factor 
greater/smaller than 1 as opposed to adding or subtract¬ 
ing fixed values. 

Some special cases for very small AEq (stuck in a 
local minimum or converged to the ground state?) and 
AEt > 0 or AEt < AEq have to be considered, mostly 
depending on the exact implementation. 

It is unclear whether there is a causal relation between 
the optimal choice of a and the ratio of AEt/AEq or 
whether both simply correlate with a proceeding DMRG 
calculation: at the beginning, gains in energy are large 
and a is optimally chosen large, whereas later on, energy 
decreases more slowly and smaller values of a are more 
appropriate. 

It is important to note that this is a tool to reach con¬ 
vergence more quickly. If one is primarily interested in a 
wavefunction representing the ground state, the calcula¬ 
tion of a new a at each iteration comes at essentially zero 
cost. If, however, the aim is to extrapolate in the trun¬ 
cation error during the calculation, then a fixed value for 
a is of course absolutely necessary. 


Both density matrix perturbation and subspace ex¬ 
pansion generally require some small mixing factor a to 
moderate the contributions of the perturbation terms. 
The optimal choice of this a depends on the number 
of states available and those required to represent the 
ground state, as well as the current speed of convergence. 
Too large values for a hinder convergence by destroying 
the improvements made by the local optimiser, whereas 
too small values lead to the calculation being stuck in 
local minima with vital states not added for the reasons 
given in Section |IIIB| The correct choice of a hence af¬ 
fects calculations to a large degree, but is also difficult to 
estimate before the start of the calculation. 


VII. NUMERICAL EXAMPLES 
A. DMRG Stuck in a Local Minimum 

In this sub-section, we will give a short example of how 
DMRG can get stuck in a local minimum even on a very 
small system. Gonsider 20 S = ^ spins with isotropic 
antiferromagnetic interactions and open boundary con¬ 
ditions. The U{1) symmetry of the system is exploited 
on the MRS basis, with the overall Sz forced to be zero. 
The initial state is constructed from 20 linearly independ¬ 
ent states, all with 3 sites on the very right at Sz = 0.5 
and TO = 20 in total. The quantum number distribution 
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Bond Index i 

Figure 2. (Colour online) The quantum number distribution 
as counted from the right at each bond of a Z = 20 system with 
5=1 and 5'*°*^* = Q. The artificial input state is shown with 
black circles. Two DMRG calculations have then been done 
on this input state, once with no enrichment term {a = 0, red 
squares) and once with subspace expansion enabled {a ^ 0, 
blue diamonds). It is clearly visible that without enrichment, 
DMRG3S can reduce some weights to zero, but cannot add 
new states - red only occurs together with black. As soon 
as enrichment is enabled, DMRG3S restores iSz symmetry 
and reflective symmetry over the bond and finds a much 
better ground state. 

at each bond is plotted in Fig. |^as black circles. 

DMRG3S is run with subspace expansion disabled, i.e. 
a = 0 throughout the calculation. The algorithm “con¬ 
verges” to some high-energy state at = —6.35479. 

The resulting quantum number distribution (red squares 
in Fig. 1^ shows clear asymmetry both between the left 
and right parts of the system and the +Sz and —Sz sec¬ 
tors at any given bond. It is also visible that while some 
states are removed by DMRG3S without enrichment, it 
cannot add new states: the red squares only occur to¬ 
gether with the black filled circles from the input state. 

If we enable enrichment via subspace expansion, i.e. 
take a ^ 0, DMRG3S quickly converges to a much bet¬ 
ter ground state at = —8.6824724. The quantum 

numbers are now evenly distributed between the left- and 
right parts of the system and ±^2 symmetry is also re¬ 
stored. 


B. Application to Physical Systems 

In the following subsections, we will compare the two 
single-site DMRG algorithms GWF and DMRG3S when 
applied to four different physical systems: a 5 = 1 Heis¬ 
enberg spin chain with periodic boundary conditions, a 


bosonic system with an optical lattice potential, a Fermi- 
Hubbard model at 17 = 1 and quarter-filling and a system 
of free fermions at half-filling. 

Each algorithm is run at three different values of 
m = rrimaxj 'm'maxl‘2, 'tn^ax!^ from the same initial state 
and run to convergence. This way, it is possible to both 
observe the behaviour of the methods at low and high 
accuracies. 

The usual setup in DMRG calculations of starting at 
small TO and increasing to slowly while the calculation 
progresses makes it unfortunately very difficult to com¬ 
pare between the three methods. This is because differ¬ 
ent methods require different configurations to converge 
optimally. We therefore restrict ourselves to fixed to 
throughout an entire calculation, even though all meth¬ 
ods could be sped up further by increasing to slowly dur¬ 
ing the calculation. 

Errors in energy compared to a numerically exact ref¬ 
erence value Eq are plotted as a function of sweeps and 
GPU time. It should be stressed that this error in en¬ 
ergy is not directly comparable to the truncation error 
traditionally used in two-Site DMRG or the variance 
{tP') — (77)^ sometimes considered in single-site DMRG. 
Even small differences in energy can lead to vastly dif¬ 
ferent physical states and reaching maximal accuracy in 
energy is crucial to ensure that the true ground state has 
been reached. 

Furthermore, a traditional two-site DMRG (2DMRG) 
calculation without perturbations is done and its error 
in energy and runtime to convergence is compared to the 
two single-site algorithms. Here, convergence is defined 
as a normalised change in energy less than 10“® (for to = 
TTimax) resp. 10“® (for TO < rumax)- The runtime to 
convergence is the GPU time used until that energy was 
output by the eigensolver for the first time. 

All calculations were performed on a single core of a 
Xeon E5-2650. 


1. 5=1 Heisenberg Chain 

Firstly, we consider a 5 = 1 Heisenberg spin chain 
with I — 100 sites and periodic boundary conditions im¬ 
plemented on the level of the Hamiltonian as a simple 
link between the first and last site: 

100 

H = ^ Si ■ -S(i+i)%ioo • (40) 

i=l 

C/(l) symmetries are exploited and the calculations are 
forced in the ^2 = 0 sector. 

This system is of particular interest as, firstly, it is one 
of the standard benchmarking systems with well-known 
analytic values for the ground-state energy. Secondly, 
it is a one-dimensional system where the case of peri¬ 
odic boundary conditions can still be tackled by DMRG. 
The larger MPO bond dimension resulting from these 







Figure 3. (Colour online) Spin Chain Eq. ( [40| : Normal¬ 
ised error in energy as a function of sweeps (left) and CPU 
time used (right) of the two single-site algorithms at different 
m = 200,400,800. DMRG3S shows both a speed-up and an 
improved convergence per sweep compared to CWF, with a 
long tail of slow convergence very visible for CWF at high 
accuracies. 


Table I. Spin Chain Eq. (40l: Normalised error in energy at 
convergence and runtime to convergence of all three meth¬ 
ods. DMRG3S is consistently faster than CWF, whereas the 
energies provided by 2DMRG are not comparable in accuracy. 



m = 200 

m = 400 

m = 800 

DMRG3S Energy Error 

2.1 X 10"® 

1.0 X 10"'^ 

7.1 X 10~® 

CWF Energy Error 

2.8 X 10"® 

1.7 X 10"'^ 

7.1 X 10“® 

2DMRG Energy Error 

1.1 X 10"® 

8.6 X 10"'^ 

1.0 X 10~'^ 

DMRG3S Runtime 

583 s 

1935 s 

3990 s 

CWF Runtime 

1519 s 

2695 s 

11133 s 

2DMRG Runtime 

762 s 

3181 s 

21963 s 


PBC similarly arises during the simulation of quasi two- 
dimensional systems as cylinders. The same applies 
to the non-nearest-neighbour interactions in this system 
(between the first and last site) and cylindrical systems. 

Fig.i compares the error in energy with respect to 
the reference value Eq = —140.148 404 for DMRG3S and 
CWF for m = 200,400,800 as a function of sweeps and 
computation time. 

During the first three to four sweeps, DMRG3S ex¬ 
hibits a smaller convergence rate per sweep, however, 
compared to the first sweeps of CWF, they also cost 
negligible CPU time. Afterwards, DMRG3S offers com¬ 
parable (at medium accuracies) or much improved (at 
high accuracies) convergence rate per sweep as compared 
to CWF together with a still reduced average runtime 
per sweep. Combined, these effects lead to a speed-up 


of 2.6,1.3 and 2.7 for m = 200,400 and 800 respect¬ 
ively between CWF and DMRG3S when considering the 
runtime to convergence. 

In comparison, the 2DMRG algorithm does not handle 
the periodic boundary conditions well and yields energies 
higher than the single-site algorithms with perturbations 
(cf. Tab. |T|. Runtime to convergence is hence not com¬ 
parable. 


2. Dilute Bosons on an Optical Lattice 

We carry on to study bosons in a modulated potential 
of 10 unit cells, each with 16 sites. The cutoff for local 
occupation numbers is Umax = 5, resulting in a local site 
dimension of d = 6. The Hamiltonian is given as 

H = + jcos^ 16 ~ 

159 

i=l 

This system should be fairly easy for DMRG to handle, 
as there are only nearest-neighbour interactions. How¬ 
ever, the large-scale order due to the modulated poten¬ 
tial and a very small energy penalty paid for an uneven 
distribution of bosons was observed to cause badly con¬ 
verged results.l^ Manual checks of the states returned by 
each method were hence done to ensure a proper, equal 
distribution of bosons throughout the whole system. 

The state is initialised with n = 80 bosons in total. We 
allow m = 50,100, 200 states and use the energy reference 
value Eq = —103.646 757. All algorithms converge to this 
value at m = 200. 

Fig. compares CWF and DMRG3S whereas Tab. 
additionally lists 2DMRG. Since the bond dimensions are 
relatively small, we do not expect a speed-up from faster 
numerical operations. Instead, the improved convergence 
behaviour per sweep is responsible for the speed-up of 2 
of DMRG3S over CWF at small m. At larger m, CWF 
converges better, but numerical operations also become 
cheaper for DMRG3S for a speed-up of 2 again. 

As there are no long-range interactions, 2DMRG also 
fares well with regard to energy accuracy. However, it 
takes longer to converge than the single-site methods es¬ 
pecially at large m, mainly because the eigenvalue prob¬ 
lem in two-site DMRG is of dimension d larger than in 
single-site DMRG. A comparison between DMRG3S and 
2DMRG leads to a speed-up of up to 3.3 for the case of 
m = 200. 


3. Fermi-Hubbard Model 

As a third example, substantially more expensive cal¬ 
culations are carried out for a substantially stronger en- 
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Figure 4. (Colour online) Bosonic System Eq. (41l: Normal¬ 
ised error in energy from CWF and DMRG3S as a function of 
sweeps (left) and CPU time used (right) for m = 50,100, 200. 
Again, an improved convergence behaviour at high accuracies 
can be observed, in particular at smaller values of m. The 
small bond dimensions lead to a smaller speed-up due to faster 
numerical operations, which only becomes visible at m = 200. 


Figure 5. (Colour online) Fermi-Hubbard Eq. (42l: Normal¬ 
ised error in energy from DMRG3S and CWF as a function 
of sweeps (left) and CPU time used (right) for different bond 
dimensions m = 300, 600,1200. The same basic behaviour 
as for the previous systems is repeated, with both improved 
convergence behaviour at high accuracies and faster numerical 
operations. 


Table IF Bosonic System Eq. (411: Normalised error in en¬ 
ergy at convergence and runtime to convergence of all three 
methods. DMRG3S is again the fastest method with a very 
constant speed-up of 2 over GWF and up to 3.3 over 2DMRG. 



TO = 50 

TO = 100 

TO = 200 

DMRG3S Energy Error 

2.9 X 10“® 

4.8 X 10“® 

< 10“® 

CWF Energy Error 

2.3 X 10~® 

3.9 X 10“® 

05 

1 

O 

V 

2DMRG Energy Error 

1.9 X 10“® 

2.8 X 10“® 

A 

o 

1 

DMRG3S Runtime 

124 s 

171 s 

469 s 

CWF Runtime 

260 s 

397 s 

951 s 

2DMRG Runtime 

210 s 

462 s 

1550 s 


tangled Fermi-Hubbard model of 100 sites with Hamilto¬ 
nian 

too 


Both C/(l)charge ^nd U{l)sz Symmetries are employed, 
with 50 fermions and = 0 enforced through the 

choice of initial state. Together with the free fermi¬ 
ons from the next section, we can use this system to 
study how criticality and increased entanglement affect 
the three methods. 

Calculations are done for m = 300,600,1200. All 
methods converge to the same value Eq = —84.255 525 4 
at large m. 

Fig. § compares the two single-site methods while 





+ > • (42) 


■=t.4 


Table III. Fermi-Hubbard Eq. (42|: Normalised error in en¬ 
ergy at convergence and runtime to convergence of all three 
methods. Accuracies are comparable between the different 
methods, but runtimes vary greatly. 



TO = 300 

TO = 600 

TO = 1200 

DMRG3S Energy Error 

1.5 X 10“® 

7.5 X 10"® 

05 

1 

O 

V 

CWF Energy Error 

1.5 X 10“® 

7.6 X 10"® 

A 

1 —' 

o 

1 

2DMRG Energy Error 

1.3 X 10“® 

6.4 X 10"® 

< 10"® 

DMRG3S Runtime 

474 s 

1367 s 

3955 s 

CWF Runtime 

1215 s 

3917 s 

10122 s 

2DMRG Runtime 

727 s 

2950 s 

15596 s 


Tab. |III| summarises all three DMRG implementations. 
Since the system only exhibits local interactions, 2DMRG 
fares well and all methods generally provide compar¬ 
able energies. The difference is therefore in the runtime 
needed to achieve these energies. Compared to CWF, 
DMRG3S achieves a speed-up of « 2.6 consistently at 
all TO, as the smallest to = 300 is already large enough 
to justify the assumption cPw <?; to in the speed-up of 
numerical operations. In particular, it continues to con¬ 
verge quickly at high accuracies whereas CWF develops 
a long tail of slow convergence. The speed-up compared 
to 2DMRG is smaller at lower values of to, but increases 
to 3.9 at TO = 1200. 
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Figure 6. (Colour online) Free Fermions Eq. ( |43[ |: Norm¬ 
alised error in energy from CWF and DMRG3S as a func¬ 
tion of sweeps (left) and CPU time used (right) at m = 
300, 600,1200. CWF again exhibits a long tail of slow con¬ 
vergence while DMRG3S converges quickly at all m and all 
accuracies. 


is the same as for the Fermi-Hubbard system, namely 
m = 300,600,1200. We used Eq = —126.602 376 as 
the reference value, since all methods converged to this 
ground-state energy at m = 1200. 

The results in Tab. |lV| and Fig. [^mostly follow the pre¬ 
vious results for locally interacting systems: Accuracies 
of all methods are essentially identical, whereas time to 
convergence varies between the methods. At small m, 
there are some speed-ups of DMRG3S over CWF, largely 
due to better convergence behaviour per sweep, whereas a 
significant advantage of DMRG3S becomes visible at lar¬ 
ger m, when numerical operations become cheaper com¬ 
pared to the CWF method. Correspondingly, the speed¬ 
up from CWF to DMRG3S increases from 1.6 at m = 300 
to 2 at m = 1200. 

Similarly, the larger numerical cost of two-site DMRG 
becomes more noticeable at larger m, with the speed-up 
between 2DMRG and DMRG3S increasing from 1.5 at 
m = 300 to more than 6 at m = 1200. 

Compared to the non-critical Fermi-Hubbard system 
from Section |V11 B 3| we observe larger errors in energy 
at fixed m, as expected. Correspondingly, as more eigen¬ 
values contribute significantly, convergence of both the 
eigenvalue solver and the singular value decompositions 
becomes slower, leading to a slow-down of all three meth¬ 
ods. 


Table IV. Free Fermions Eq. 431: Normalised error in energy 
at convergence and runtime to convergence of all three meth¬ 
ods. 



m = 300 

m = 600 

m = 1200 

DMRG3S Energy Error 

5.0 X 10“® 

2.8 X lO"’’ 

05 

1 

0 

V 

CWF Energy Error 

3.8 X 10“® 

2.8 X lO"’’ 

05 

1 

0 

V 

2DMRG Energy Error 

3.7 X 10“® 

2.6 X lO"’’ 

A 

0 

1 

DMRG3S Runtime 

533 s 

1452 s 

4643 s 

CWF Runtime 

863 s 

2590 s 

9586 s 

2DMRG Runtime 

794 s 

4584 s 

29698 s 


4 . Free Fermions 


Finally, we consider a model of free fermions on a chain 
of 100 sites with Hamiltonian 


100 


H = - 


E E [«l 

1=1 <T=t,4 


C-t-l,a' h.C. 


(43) 


The maximally delocalised wavefunction found in the 
ground-state of this system is notoriously difficult for 
MPS formats in general to reproduce faithfully. At the 
same time, most other parameters are identical (d, I, 
m) or very close (tu) to those in the Fermi-Hubbard 
model from Section TVII B 31 The calculation is done us¬ 
ing C/(l)f.harP'p and U(l)fi 7 symmetries at half-filling with 
N = 100 fermions and = 0. The choice of m 


VIII. CONCLUSIONS 

The new strictly single-site DMRG (DMRG3S) al¬ 
gorithm results in a theoretical speed-up of ^ (d -|- l)/2 
during the optimisation steps compared to the center- 
matrix wavefunction formalism (CWF), provided that 
(Pw/m is small. Further, convergence rates per sweep are 
improved in the important and computationally most ex¬ 
pensive high-accuracy/large-TO phase of the calculation. 
In addition, auxiliary calculations (enrichment, normal¬ 
isation, etc.) are sped up and memory requirements are 
relaxed. 

Numerical experiments confirm a speed-up within the 
theoretical expectations compared to the CWF method. 
The efficiency of single-site DMRG in general compared 
to the traditional two-site DMRG was substantiated fur¬ 
ther by a large speed-up at comparable accuracies in en¬ 
ergy. 
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